International Chemical Identifier Totally Explained
The IUPAC International Chemical Identifier (InChI, pronounced "INchee") is a textual identifier for chemical substances, designed to provide a standard, human-readable way to encode molecular information and to facilitate searching for such information in databases and on the web. Developed by IUPAC and NIST during 2000-2005, the format and algorithms are non-proprietary, and the software is freely available under the open-source LGPL (though the term "InChI" is a trademark of IUPAC).
Overview
Chemical substances are described in terms of layers of information: the atoms and their bond connectivity, tautomeric information, isotope information, stereochemistry, and electronic charge information.
Not all layers have to be provided; for instance, the tautomer layer can be omitted if that type of information isn't relevant to the particular application. Information about the 3-dimensional coordinates of atoms isn't represented in InChI.
InChIs differ from the widely used CAS registry numbers in three respects:
- they're freely usable and non-proprietary;
- they can be computed from structural information and don't have to be assigned by some organization;
- most of the information in an InChI is human readable (with practice).
InChIs can thus be seen as akin to a general and extremely formalized version of IUPAC names.
The InChI algorithm converts input structural information into a unique InChI identifier in a three-step process: normalization (to remove redundant information), canonicalization (to generate a unique number label for each atom), and serialization (to give a string of characters).
The InChIKey, sometimes referred to as a hashed InChI, is a fixed-length (25-character) condensed digital representation of the InChI that isn't human-readable. The InChIKey specification was released in September 2007 to facilitate web searches for chemical compounds, which had proved problematic with the full-length InChI.
Examples
Ethanol (CH3CH2OH):
InChI=1/C2H6O/c1-2-3/h3H,2H2,1H3
L-ascorbic acid:
InChI=1/C6H8O6/c7-1-2(8)5-3(9)4(10)6(11)12-5/h2,5,7-10H,1H2/t2-,5+/m0/s1
Format and layers
Every InChI starts with the string "InChI=" followed by the version number, currently 1. The remaining information is structured as a sequence of layers and sublayers, with each layer providing one specific type of information. The layers and sublayers are separated by the delimiter "/" and start with a characteristic prefix letter (except for the chemical formula sublayer of the main layer). The six layers, with their important sublayers, are:
Main layer
- Chemical formula (no prefix). This is the only sublayer that must occur in every InChI.
- Atom connections (prefix: "c"). The atoms in the chemical formula (except for hydrogens) are numbered in sequence; this sublayer describes which atoms are connected by bonds to which other ones.
- Hydrogen atoms (prefix: "h"). Describes how many hydrogen atoms are connected to each of the other atoms.
Charge layer
- proton sublayer (prefix: "p"), recording protons added to or removed from the structure
- charge sublayer (prefix: "q"), recording the net charge
Stereochemical layer
Isotopic layer
Fixed-H layer
Reconnected Layer
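The delimiter-and-prefix layout described above can be illustrated with a minimal Python sketch. The function name `split_inchi` and the returned dictionary shape are hypothetical choices made for this example, not part of any InChI software. The sketch assumes, as the text states, that the chemical formula is always the first segment after the version number, and it does not interpret the contents of any layer.

```python
def split_inchi(inchi: str) -> dict:
    """Split an InChI string into version, formula, and prefixed layers.

    Hypothetical helper for illustration only; it does not validate chemistry.
    """
    prefix = "InChI="
    if not inchi.startswith(prefix):
        raise ValueError("not an InChI string: missing 'InChI=' prefix")

    # Everything after "InChI=" is a "/"-separated sequence of segments.
    version, *segments = inchi[len(prefix):].split("/")

    result = {"version": version, "formula": None, "layers": []}
    if segments:
        # Assumption from the text: the formula is the only segment
        # without a prefix letter, and it comes first.
        result["formula"] = segments[0]
        # Each later segment is (prefix letter, content). A list is used
        # rather than a dict because a prefix letter may occur more than once.
        result["layers"] = [(seg[0], seg[1:]) for seg in segments[1:] if seg]
    return result


ethanol = split_inchi("InChI=1/C2H6O/c1-2-3/h3H,2H2,1H3")
print(ethanol["version"])  # 1
print(ethanol["formula"])  # C2H6O
print(ethanol["layers"])   # [('c', '1-2-3'), ('h', '3H,2H2,1H3')]
```

For the ethanol example, the "c" segment lists the heavy atoms as a chain 1-2-3 (carbon, carbon, oxygen), and the "h" segment attaches one hydrogen to atom 3, two to atom 2, and three to atom 1.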
The delimiter-prefix format has the advantage that a user can easily use a wildcard search to find identifiers that match only in certain layers.
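As a rough illustration of such a wildcard search, the sketch below uses Python's standard `fnmatch` module on a small in-memory list. The helper name `find_matching` and the sample list are assumptions made for this example; the dimethyl ether and methanol strings are written in the same version-1 style as the ethanol example and are included only to give the patterns something to distinguish.

```python
from fnmatch import fnmatchcase

# Illustrative sample data (not taken from any database).
identifiers = [
    "InChI=1/C2H6O/c1-2-3/h3H,2H2,1H3",  # ethanol
    "InChI=1/C2H6O/c1-3-2/h1-2H3",       # dimethyl ether (same formula)
    "InChI=1/CH4O/c1-2/h2H,1H3",         # methanol
]


def find_matching(pattern: str, inchis: list) -> list:
    """Return the identifiers that match a shell-style wildcard pattern."""
    # fnmatchcase is case-sensitive, which matters for element symbols.
    return [s for s in inchis if fnmatchcase(s, pattern)]


# Match on the formula sublayer only, whatever the other layers contain.
same_formula = find_matching("InChI=1/C2H6O/*", identifiers)

# Match on the connectivity sublayer only, whatever the formula is.
same_connectivity = find_matching("InChI=1/*/c1-2-3/*", identifiers)

print(len(same_formula))       # 2
print(len(same_connectivity))  # 1
```

Note that `*` in `fnmatch` also matches the "/" delimiter, and the second pattern requires another layer after the connectivity sublayer, so a production search would more likely split the string into layers first and compare them one by one.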
InChIKey
The condensed, 25-character InChIKey is a hashed version of the full InChI (using the SHA-256 algorithm), designed to allow for easy web searches of chemical compounds. Most chemical structures on the Web up to 2007 were represented as GIF files, which are not searchable for chemical content. The full InChI turned out to be too lengthy for easy searching, and therefore the InChIKey was developed. There is a very small but nonzero chance of two different molecules having the same InChIKey; the probability of duplication of only the first 14 characters has been estimated as one duplication in 75 databases, each containing one billion unique structures. With all databases currently holding fewer than 50 million structures, such duplication appears unlikely at present.
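The order of magnitude of that estimate can be checked with a birthday-style calculation. The sketch below is only a plausibility check, not the method used to derive the quoted figure. It assumes that the 14-character block carries about 65 bits of hash output (a value not stated in the text above) and that hash values are uniformly distributed; the function name `expected_collisions` is hypothetical.

```python
def expected_collisions(n_structures: int, hash_bits: int) -> float:
    """Expected number of colliding pairs among n uniformly random hashes."""
    # Number of distinct pairs of structures that could collide.
    pairs = n_structures * (n_structures - 1) / 2
    # Each pair collides with probability 1 / 2**hash_bits.
    return pairs / 2 ** hash_bits


HASH_BITS = 65  # assumption: information content of the first 14 characters

per_billion = expected_collisions(10**9, HASH_BITS)
per_50_million = expected_collisions(50 * 10**6, HASH_BITS)

print(f"one expected collision per {1 / per_billion:.0f} "
      "databases of one billion structures")
print(f"expected collisions among 50 million structures: {per_50_million:.1e}")
```

Under these assumptions the first figure lands in the same range as the quoted one-in-75 estimate, and the expected number of collisions for a 50-million-structure database is several orders of magnitude below one.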
InChIKeys consist of 14 characters resulting from a hash of the connectivity information of the InChI, followed by a hyphen, followed by 8 characters resulting from a hash of the remaining layers of the InChI, followed by a single character indicating the version of InChI used, followed by a single checksum character.
Example: The InChI for morphine is InChI=1/C17H19NO3/c1-18-7-6-17-10-3-5-13(20)16(17)21-15-12(19)4-2-9(14(15)17)8-11(10)18/h2-5,10-11,13,16,19-20H,6-8H2,1H3/t10-,11-,13-,16-,17-/m0/s1
and the InChIKey for morphine is BQJCRHHNABKAKU-XKUOQXLYBY.
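To make the 25-character layout concrete, the following sketch splits the morphine InChIKey into the four parts described above. The function name `split_inchikey` and the field names are hypothetical. The sketch assumes the key consists of uppercase letters only, as in the example, and it does not verify the checksum character, since the checksum algorithm is not given here.

```python
import re

# Layout from the text: 14 letters, hyphen, 8 letters,
# one version character, one checksum character (25 characters in total).
_KEY_PATTERN = re.compile(r"^([A-Z]{14})-([A-Z]{8})([A-Z])([A-Z])$")


def split_inchikey(key: str) -> dict:
    """Split a 25-character InChIKey into its four documented parts."""
    match = _KEY_PATTERN.match(key)
    if match is None:
        raise ValueError(f"not a 25-character InChIKey: {key!r}")
    connectivity, remaining, version, checksum = match.groups()
    return {
        "connectivity_hash": connectivity,  # hash of the connectivity information
        "remaining_hash": remaining,        # hash of the remaining layers
        "version_flag": version,            # InChI version indicator
        "checksum": checksum,               # not verified in this sketch
    }


parts = split_inchikey("BQJCRHHNABKAKU-XKUOQXLYBY")
print(parts["connectivity_hash"])  # BQJCRHHNABKAKU
print(parts["remaining_hash"])     # XKUOQXLY
print(parts["version_flag"])       # B
print(parts["checksum"])           # Y
```

Because the first block depends only on connectivity, two keys that share it but differ in the second block would point to structures with the same skeleton that differ in the remaining layers, such as stereochemistry or isotopes.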
Name
The format was originally called IChI (IUPAC Chemical Identifier), then renamed in July 2004 to INChI (IUPAC-NIST Chemical Identifier), and renamed again in November 2004 to InChI (IUPAC International Chemical Identifier), a trademark of IUPAC.